Collaborator

@nikhilwoodruff nikhilwoodruff commented Jun 16, 2025

Fixes #164

After this PR, we could give users access tokens for private microdata. The process would be:

  • Create a service account and grant it the Storage Legacy Bucket Reader and Storage Legacy Object Reader roles on the relevant Google Cloud bucket in the project PolicyEngine research
  • Generate a JSON private key and base64-encode the contents
  • Pass this key to a client

The argument for base64 encoding is that it avoids the complexity and bugs around string formatting (quoting, newlines, etc.) that come up when pasting the raw key contents.
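
The round trip described above can be sketched as follows. This is a minimal illustration using only the standard library, not the code in this PR: the function names and the `MICRODATA_ACCESS_TOKEN` variable name in the usage note are hypothetical, and the required-field check is an assumption about what a usable service account key should contain.

```python
import base64
import json

# Fields assumed (for this sketch) to be the minimum a usable key needs.
REQUIRED_FIELDS = {"type", "client_email", "private_key"}


def encode_service_account_key(key_json: str) -> str:
    """Turn the raw JSON key file contents into a single-line base64 token."""
    return base64.b64encode(key_json.encode("utf-8")).decode("ascii")


def decode_service_account_token(token: str) -> dict:
    """Decode a base64 token back into the service account key as a dict."""
    try:
        raw = base64.b64decode(token.strip(), validate=True)
        info = json.loads(raw.decode("utf-8"))
    except ValueError as exc:
        # binascii.Error, UnicodeDecodeError and JSONDecodeError all
        # subclass ValueError, so one handler covers every failure mode.
        raise ValueError("Token is not valid base64-encoded JSON") from exc

    if not isinstance(info, dict):
        raise ValueError("Decoded token is not a JSON object")

    missing = REQUIRED_FIELDS - info.keys()
    if missing:
        raise ValueError(f"Key is missing fields: {sorted(missing)}")
    return info
```

A client would then read the token from an environment variable, for example `decode_service_account_token(os.environ["MICRODATA_ACCESS_TOKEN"])`, and hand the resulting dict to whatever credential loader it uses. Because the token is a single line with no quotes or newlines, it survives copy-paste and shell export unchanged.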

@nikhilwoodruff nikhilwoodruff self-assigned this Jun 16, 2025
@nikhilwoodruff nikhilwoodruff added the enhancement New feature or request label Jun 16, 2025
@nikhilwoodruff nikhilwoodruff changed the title from "Add option to reach data download token from b64 SA keys." to "Add option to reach data download token from b64 SA keys" Jun 16, 2025
Contributor

@anth-volk anth-volk left a comment

@nikhilwoodruff Question for you around desired implementation.

Simplifies the dependent code and unit testing.
"""

def __init__(self):
Contributor

question, blocking: Have you looked into Google Secret Manager at all?

One option I've found: instead of creating individual service accounts for customers, we could generate our own API keys, store them in Google Secret Manager, and pass and validate them via SHA-256 hashing. This seems closer to what you're trying to do. It also gives us more granular control over which datasets a key unlocks, since we can store richer metadata with the key (imagine one day we create another type of limited-access dataset that we don't want these customers to access).

I have a Claude chat here describing Google Secret Manager, as well as other options that may be less desirable (e.g., JWTs). The most relevant portions are toward the end. Curious to hear your thoughts.
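
A minimal sketch of the key-hashing idea above, under stated assumptions: the in-memory dict stands in for whatever secret store would hold the hashes and per-key metadata, and every function name here is hypothetical rather than part of this PR or of any Google API. Only the SHA-256 digest of each key is stored, so a leaked store does not reveal the keys themselves.

```python
import hashlib
import hmac
import secrets


def hash_api_key(key: str) -> str:
    """Return the hex SHA-256 digest of an API key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def generate_api_key() -> tuple[str, str]:
    """Create a random key; return (key for the customer, digest to store)."""
    key = secrets.token_urlsafe(32)
    return key, hash_api_key(key)


def validate_api_key(
    presented: str, stored: dict[str, set[str]], dataset: str
) -> bool:
    """Check a presented key against stored digests and dataset permissions.

    `stored` maps each key digest to the set of dataset names it may access.
    """
    digest = hash_api_key(presented)
    for stored_digest, datasets in stored.items():
        # Constant-time comparison avoids leaking digest prefixes via timing.
        if hmac.compare_digest(digest, stored_digest):
            return dataset in datasets
    return False
```

The per-key dataset set is where the richer metadata would live. Note that something still has to run `validate_api_key` on the server side before handing out data, which is the hosting question raised in the reply below.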

Collaborator Author

We would have to build and host a separate service to handle the authentication, though, right?

Collaborator

I think I need to understand our use case better here. I don't really want to manage service accounts for external users or force the user to authenticate using a specific mechanism.

Is there a reason we can't have them provide a Gmail account or service account that they own and manage, and then grant it permission to access the bucket?

Collaborator Author

The reason is that we want to keep the process of running code that downloads the microdata simple. A single microdata access token set as an environment variable is as simple as it gets.

4 participants